1 Introduction

The primary objective of this project is to identify and visualize meaningful trends in U.S. car accidents from 2016 to 2023. We aim to uncover when accidents occur most frequently (by time of day, day of the week, and month of the year) and to determine whether specific holidays are associated with higher accident rates. We also explore geographic trends by identifying which states experience the highest and lowest numbers of accidents, and we assess whether environmental factors such as weather, visibility, or road conditions contribute to accident severity. Finally, we examine long-term trends to understand how accident frequency has changed over the years. By presenting our findings through a series of targeted visualizations, we hope to provide insights useful for public safety efforts, transportation planning, and future academic research.

2 Dataset Description

This analysis uses the U.S. Accidents (2016–2023) dataset compiled by Sobhan Moosavi, which is publicly available on Kaggle. The dataset contains roughly 7.7 million records of traffic accidents that occurred in the United States between February 2016 and March 2023.

2.1 Source

2.2 Description

Each row in the dataset represents a single traffic accident and contains information collected from traffic cameras, sensors, police reports, and other public sources. The data includes:

- Timestamp and location
- Weather conditions
- Traffic and visibility indicators
- Accident severity (rated 1 to 4)

2.3 Key Features Used in This Analysis

The following variables were selected or engineered for this project:

- Severity: Level of accident seriousness (1 = least severe, 4 = most severe)
- Start_Time: Timestamp of when the accident began
- Temperature(F), Precipitation(in), Wind_Chill(F), Visibility(mi): Weather-related variables
- Weather_Condition: Categorical weather label (e.g., Clear, Rain, Fog)
- State: Two-letter U.S. state abbreviation
- date_, hour, month, day_of_week: Time-based features derived from Start_Time
- holiday_specific: Boolean indicator for specific U.S. holidays (e.g., Memorial Day, Christmas)
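The time-based features and the holiday flag listed above could be derived from Start_Time roughly as follows. This is a minimal base-R sketch, not the authors' code: the helper names (`nth_weekday`, `us_holidays`, `add_time_features`) are hypothetical, and the holiday list is an assumption covering six common U.S. holidays, since the report does not state which holidays `holiday_specific` includes.

```r
# Minimal sketch (hypothetical helpers, not the authors' code).
# Assumes Start_Time is POSIXct, as in the str() output.

# n-th occurrence of a weekday in a month (n = -1 means the last one);
# wday uses POSIXlt coding: 0 = Sunday, 1 = Monday, ..., 6 = Saturday
nth_weekday <- function(year, month, wday, n) {
  first <- as.Date(sprintf("%d-%02d-01", year, month))
  days  <- seq(first, by = "day", length.out = 31)
  days  <- days[as.integer(format(days, "%m")) == month]  # drop days spilling into next month
  hits  <- days[as.POSIXlt(days)$wday == wday]
  if (n > 0) hits[n] else hits[length(hits)]
}

# Assumed holiday list; the report does not say which holidays were flagged
us_holidays <- function(years) {
  out <- as.Date(character(0))
  for (y in years) {
    out <- c(out,
             as.Date(sprintf("%d-01-01", y)),  # New Year's Day
             nth_weekday(y, 5, 1, -1),         # Memorial Day: last Monday of May
             as.Date(sprintf("%d-07-04", y)),  # Independence Day
             nth_weekday(y, 9, 1, 1),          # Labor Day: first Monday of September
             nth_weekday(y, 11, 4, 4),         # Thanksgiving: fourth Thursday of November
             as.Date(sprintf("%d-12-25", y)))  # Christmas Day
  }
  out
}

add_time_features <- function(df) {
  # format() uses the time zone stored on Start_Time, so all fields agree
  df$date_       <- as.Date(format(df$Start_Time, "%Y-%m-%d"))
  df$year        <- as.integer(format(df$Start_Time, "%Y"))
  df$month       <- as.integer(format(df$Start_Time, "%m"))
  df$hour        <- as.integer(format(df$Start_Time, "%H"))
  df$day_of_week <- weekdays(df$Start_Time)  # day names depend on the locale
  df$holiday_specific <- df$date_ %in% us_holidays(unique(df$year))
  df
}
```

Rule-based holidays such as Memorial Day and Thanksgiving move every year, which is why the sketch computes them per year instead of matching on month and day alone.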

These features were used to explore patterns in accident frequency and severity across time, weather, and holidays.

3 Descriptive Analysis

##       ID               Source             Severity    
##  Length:7546771     Length:7546771     Min.   :1.000  
##  Class :character   Class :character   1st Qu.:2.000  
##  Mode  :character   Mode  :character   Median :2.000  
##                                        Mean   :2.212  
##                                        3rd Qu.:2.000  
##                                        Max.   :4.000  
##                                                       
##    Start_Time                        End_Time                     
##  Min.   :2016-01-14 20:18:33.00   Min.   :2016-02-08 06:37:08.00  
##  1st Qu.:2018-11-20 16:22:02.00   1st Qu.:2018-11-20 17:22:44.50  
##  Median :2020-11-10 08:23:39.00   Median :2020-11-10 15:11:14.00  
##  Mean   :2020-06-02 04:07:56.43   Mean   :2020-06-02 11:34:12.32  
##  3rd Qu.:2022-01-19 08:15:20.50   3rd Qu.:2022-01-19 19:01:21.00  
##  Max.   :2023-03-31 23:30:00.00   Max.   :2023-03-31 23:59:00.00  
##                                                                   
##    Start_Lat       Start_Lng          End_Lat           End_Lng       
##  Min.   :24.55   Min.   :-124.62   Min.   :25        Min.   :-125     
##  1st Qu.:33.38   1st Qu.:-117.22   1st Qu.:33        1st Qu.:-118     
##  Median :35.80   Median : -87.81   Median :36        Median : -88     
##  Mean   :36.19   Mean   : -94.71   Mean   :36        Mean   : -96     
##  3rd Qu.:40.11   3rd Qu.: -80.38   3rd Qu.:40        3rd Qu.: -80     
##  Max.   :49.00   Max.   : -67.11   Max.   :49        Max.   : -67     
##                                    NA's   :3341777   NA's   :3341777  
##   Distance(mi)     Description           Street              City          
##  Min.   :  0.000   Length:7546771     Length:7546771     Length:7546771    
##  1st Qu.:  0.000   Class :character   Class :character   Class :character  
##  Median :  0.028   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :  0.558                                                           
##  3rd Qu.:  0.460                                                           
##  Max.   :441.750                                                           
##                                                                            
##     County             State             Zipcode            Country         
##  Length:7546771     Length:7546771     Length:7546771     Length:7546771    
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##    Timezone         Airport_Code       Weather_Timestamp               
##  Length:7546771     Length:7546771     Min.   :2016-01-14 19:51:00.00  
##  Class :character   Class :character   1st Qu.:2018-11-20 16:15:00.00  
##  Mode  :character   Mode  :character   Median :2020-11-10 08:30:00.00  
##                                        Mean   :2020-06-02 04:08:26.56  
##                                        3rd Qu.:2022-01-19 07:58:00.00  
##                                        Max.   :2023-03-31 23:53:00.00  
##                                                                        
##  Temperature(F)   Wind_Chill(F)      Humidity(%)      Pressure(in)  
##  Min.   :-58.00   Min.   :-80.0     Min.   :  1.00   Min.   : 0.00  
##  1st Qu.: 49.00   1st Qu.: 43.0     1st Qu.: 48.00   1st Qu.:29.37  
##  Median : 64.00   Median : 62.0     Median : 67.00   Median :29.86  
##  Mean   : 61.67   Mean   : 58.3     Mean   : 64.84   Mean   :29.54  
##  3rd Qu.: 76.00   3rd Qu.: 75.0     3rd Qu.: 84.00   3rd Qu.:30.03  
##  Max.   :129.20   Max.   :128.0     Max.   :100.00   Max.   :58.63  
##                   NA's   :1833858   NA's   :10278    NA's   :7904   
##  Visibility(mi)   Wind_Direction     Wind_Speed(mph)  Precipitation(in)
##  Min.   :  0.00   Length:7546771     Min.   :   0.0   Min.   : 0.0     
##  1st Qu.: 10.00   Class :character   1st Qu.:   4.6   1st Qu.: 0.0     
##  Median : 10.00   Mode  :character   Median :   7.0   Median : 0.0     
##  Mean   :  9.09                      Mean   :   7.7   Mean   : 0.0     
##  3rd Qu.: 10.00                      3rd Qu.:  10.4   3rd Qu.: 0.0     
##  Max.   :140.00                      Max.   :1087.0   Max.   :36.5     
##  NA's   :39499                       NA's   :429617   NA's   :2064937  
##  Weather_Condition   Amenity           Bump          Crossing      
##  Length:7546771     Mode :logical   Mode :logical   Mode :logical  
##  Class :character   FALSE:7453817   FALSE:7543317   FALSE:6688257  
##  Mode  :character   TRUE :92954     TRUE :3454      TRUE :858514   
##                                                                    
##                                                                    
##                                                                    
##                                                                    
##   Give_Way        Junction        No_Exit         Railway       
##  Mode :logical   Mode :logical   Mode :logical   Mode :logical  
##  FALSE:7511266   FALSE:6990004   FALSE:7527521   FALSE:7481985  
##  TRUE :35505     TRUE :556767    TRUE :19250     TRUE :64786    
##                                                                 
##                                                                 
##                                                                 
##                                                                 
##  Roundabout       Station           Stop         Traffic_Calming
##  Mode :logical   Mode :logical   Mode :logical   Mode :logical  
##  FALSE:7546527   FALSE:7348460   FALSE:7337406   FALSE:7539342  
##  TRUE :244       TRUE :198311    TRUE :209365    TRUE :7429     
##                                                                 
##                                                                 
##                                                                 
##                                                                 
##  Traffic_Signal  Turning_Loop    Sunrise_Sunset     Civil_Twilight    
##  Mode :logical   Mode :logical   Length:7546771     Length:7546771    
##  FALSE:6424853   FALSE:7546771   Class :character   Class :character  
##  TRUE :1121918                   Mode  :character   Mode  :character  
##                                                                       
##                                                                       
##                                                                       
##                                                                       
##  Nautical_Twilight  Astronomical_Twilight     date_                 year     
##  Length:7546771     Length:7546771        Min.   :2016-01-14   Min.   :2016  
##  Class :character   Class :character      1st Qu.:2018-11-20   1st Qu.:2018  
##  Mode  :character   Mode  :character      Median :2020-11-10   Median :2020  
##                                           Mean   :2020-06-01   Mean   :2020  
##                                           3rd Qu.:2022-01-19   3rd Qu.:2022  
##                                           Max.   :2023-03-31   Max.   :2023  
##                                                                              
##      month           hour       precipitation      any_precip     
##  Min.   : 1.0   Min.   : 0.00   Min.   : 0.00000   Mode :logical  
##  1st Qu.: 3.0   1st Qu.: 8.00   1st Qu.: 0.00000   FALSE:7016077  
##  Median : 7.0   Median :13.00   Median : 0.00000   TRUE :530694   
##  Mean   : 6.7   Mean   :12.33   Mean   : 0.00613                  
##  3rd Qu.:10.0   3rd Qu.:17.00   3rd Qu.: 0.00000                  
##  Max.   :12.0   Max.   :23.00   Max.   :36.47000                  
##                                                                   
##    weather           temperature       visibility       wind_chill     
##  Length:7546771     Min.   :-58.00   Min.   :  0.00   Min.   :-80.0    
##  Class :character   1st Qu.: 49.00   1st Qu.: 10.00   1st Qu.: 43.0    
##  Mode  :character   Median : 64.00   Median : 10.00   Median : 62.0    
##                     Mean   : 61.67   Mean   :  9.09   Mean   : 58.3    
##                     3rd Qu.: 76.00   3rd Qu.: 10.00   3rd Qu.: 75.0    
##                     Max.   :129.20   Max.   :140.00   Max.   :128.0    
##                                      NA's   :39499    NA's   :1833858  
##      sevg            state_name       
##  Length:7546771     Length:7546771    
##  Class :character   Class :character  
##  Mode  :character   Mode  :character  
##                                       
##                                       
##                                       
## 
## [1] 7546771      58
## tibble [7,546,771 × 58] (S3: tbl_df/tbl/data.frame)
##  $ ID                   : chr [1:7546771] "A-1" "A-2" "A-3" "A-4" ...
##  $ Source               : chr [1:7546771] "Source2" "Source2" "Source2" "Source2" ...
##  $ Severity             : num [1:7546771] 3 2 2 3 2 3 2 3 2 3 ...
##  $ Start_Time           : POSIXct[1:7546771], format: "2016-02-08 05:46:00" "2016-02-08 06:07:59" ...
##  $ End_Time             : POSIXct[1:7546771], format: "2016-02-08 11:00:00" "2016-02-08 06:37:59" ...
##  $ Start_Lat            : num [1:7546771] 39.9 39.9 39.1 39.7 39.6 ...
##  $ Start_Lng            : num [1:7546771] -84.1 -82.8 -84 -84.2 -84.2 ...
##  $ End_Lat              : num [1:7546771] NA NA NA NA NA NA NA NA NA NA ...
##  $ End_Lng              : num [1:7546771] NA NA NA NA NA NA NA NA NA NA ...
##  $ Distance(mi)         : num [1:7546771] 0.01 0.01 0.01 0.01 0.01 0.01 0 0.01 0 0.01 ...
##  $ Description          : chr [1:7546771] "Right lane blocked due to accident on I-70 Eastbound at Exit 41 OH-235 State Route 4." "Accident on Brice Rd at Tussing Rd. Expect delays." "Accident on OH-32 State Route 32 Westbound at Dela Palma Rd. Expect delays." "Accident on I-75 Southbound at Exits 52 52B US-35. Expect delays." ...
##  $ Street               : chr [1:7546771] "I-70 E" "Brice Rd" "State Route 32" "I-75 S" ...
##  $ City                 : chr [1:7546771] "Dayton" "Reynoldsburg" "Williamsburg" "Dayton" ...
##  $ County               : chr [1:7546771] "Montgomery" "Franklin" "Clermont" "Montgomery" ...
##  $ State                : chr [1:7546771] "OH" "OH" "OH" "OH" ...
##  $ Zipcode              : chr [1:7546771] "45424" "43068-3402" "45176" "45417" ...
##  $ Country              : chr [1:7546771] "US" "US" "US" "US" ...
##  $ Timezone             : chr [1:7546771] "US/Eastern" "US/Eastern" "US/Eastern" "US/Eastern" ...
##  $ Airport_Code         : chr [1:7546771] "KFFO" "KCMH" "KI69" "KDAY" ...
##  $ Weather_Timestamp    : POSIXct[1:7546771], format: "2016-02-08 05:58:00" "2016-02-08 05:51:00" ...
##  $ Temperature(F)       : num [1:7546771] 36.9 37.9 36 35.1 36 37.9 34 34 33.3 37.4 ...
##  $ Wind_Chill(F)        : num [1:7546771] NA NA 33.3 31 33.3 35.5 31 31 NA 33.8 ...
##  $ Humidity(%)          : num [1:7546771] 91 100 100 96 89 97 100 100 99 100 ...
##  $ Pressure(in)         : num [1:7546771] 29.7 29.6 29.7 29.6 29.6 ...
##  $ Visibility(mi)       : num [1:7546771] 10 10 10 9 6 7 7 7 5 3 ...
##  $ Wind_Direction       : chr [1:7546771] "Calm" "Calm" "SW" "SW" ...
##  $ Wind_Speed(mph)      : num [1:7546771] NA NA 3.5 4.6 3.5 3.5 3.5 3.5 1.2 4.6 ...
##  $ Precipitation(in)    : num [1:7546771] 0.02 0 NA NA NA 0.03 NA NA NA 0.02 ...
##  $ Weather_Condition    : chr [1:7546771] "Light Rain" "Light Rain" "Overcast" "Mostly Cloudy" ...
##  $ Amenity              : logi [1:7546771] FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ Bump                 : logi [1:7546771] FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ Crossing             : logi [1:7546771] FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ Give_Way             : logi [1:7546771] FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ Junction             : logi [1:7546771] FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ No_Exit              : logi [1:7546771] FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ Railway              : logi [1:7546771] FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ Roundabout           : logi [1:7546771] FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ Station              : logi [1:7546771] FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ Stop                 : logi [1:7546771] FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ Traffic_Calming      : logi [1:7546771] FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ Traffic_Signal       : logi [1:7546771] FALSE FALSE TRUE FALSE TRUE FALSE ...
##  $ Turning_Loop         : logi [1:7546771] FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ Sunrise_Sunset       : chr [1:7546771] "Night" "Night" "Night" "Night" ...
##  $ Civil_Twilight       : chr [1:7546771] "Night" "Night" "Night" "Day" ...
##  $ Nautical_Twilight    : chr [1:7546771] "Night" "Night" "Day" "Day" ...
##  $ Astronomical_Twilight: chr [1:7546771] "Night" "Day" "Day" "Day" ...
##  $ date_                : Date[1:7546771], format: "2016-02-08" "2016-02-08" ...
##  $ year                 : num [1:7546771] 2016 2016 2016 2016 2016 ...
##  $ month                : num [1:7546771] 2 2 2 2 2 2 2 2 2 2 ...
##  $ hour                 : int [1:7546771] 5 6 6 7 7 7 7 7 8 8 ...
##  $ precipitation        : num [1:7546771] 0.02 0 0 0 0 0.03 0 0 0 0.02 ...
##  $ any_precip           : logi [1:7546771] TRUE FALSE FALSE FALSE FALSE TRUE ...
##  $ weather              : chr [1:7546771] "Light Rain" "Light Rain" "Overcast" "Mostly Cloudy" ...
##  $ temperature          : num [1:7546771] 36.9 37.9 36 35.1 36 37.9 34 34 33.3 37.4 ...
##  $ visibility           : num [1:7546771] 10 10 10 9 6 7 7 7 5 3 ...
##  $ wind_chill           : num [1:7546771] NA NA 33.3 31 33.3 35.5 31 31 NA 33.8 ...
##  $ sevg                 : chr [1:7546771] "more severe" "less severe" "less severe" "more severe" ...
##  $ state_name           : chr [1:7546771] "Ohio" "Ohio" "Ohio" "Ohio" ...

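The raw dataset contains 7,728,394 observations (rows) of 46 variables (columns).

After data preparation and cleaning, the dataset contains 7,546,771 observations (rows) of 58 variables (columns): 181,623 rows were removed, and 12 derived columns (date_, year, month, hour, precipitation, any_precip, weather, temperature, visibility, wind_chill, sevg, state_name) were added to the original 46.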
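Judging from the str() output, most of the 12 added columns are renamed or lightly transformed copies of original variables: missing Precipitation(in) values appear as 0 in precipitation, and any_precip is TRUE exactly when precipitation is positive. The sketch below reproduces that pattern in base R. The function name is hypothetical, the NA-to-0 rule and the severity labels are inferred from the printed rows rather than taken from the authors' code, and the rules used to drop rows are not described in the report, so they are not reproduced here.

```r
# Minimal sketch; rules inferred from the str() output, not the authors' code.
add_derived_columns <- function(df) {
  sev_labels <- c("least severe", "less severe", "more severe", "most severe")
  precip <- df[["Precipitation(in)"]]
  df$precipitation <- ifelse(is.na(precip), 0, precip)  # assumed: missing = no precipitation
  df$any_precip    <- df$precipitation > 0
  df$weather       <- df[["Weather_Condition"]]
  df$temperature   <- df[["Temperature(F)"]]
  df$visibility    <- df[["Visibility(mi)"]]
  df$wind_chill    <- df[["Wind_Chill(F)"]]
  df$sevg          <- sev_labels[df$Severity]          # 1-4 score to text label
  # state.abb / state.name ship with base R (datasets package) but omit DC
  df$state_name    <- c(state.name, "District of Columbia")[match(df$State, c(state.abb, "DC"))]
  df
}
```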

Severity (level)    Number of Accidents
least severe (1)                 66,121
less severe (2)               6,010,987
more severe (3)               1,272,321
most severe (4)                 197,342
Total                         7,546,771

The dataset's author defines severity as the accident's impact on traffic: level 1 indicates the least impact (a short delay), while level 4 indicates a significant impact (a long delay).

Most accidents between 2016 and 2023 were categorized as “less severe” (level 2), accounting for 6,010,987 of the 7,546,771 accidents, or roughly 80%.
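A severity table like the one above, with each level's share of the total, can be produced with a few lines of base R. The helper name is hypothetical, and the labels follow the level order implied by the str() output (Severity 2 is printed as "less severe" and 3 as "more severe").

```r
# Hypothetical helper: counts and percentage share for severity levels 1-4
severity_table <- function(severity) {
  labels <- c("least severe", "less severe", "more severe", "most severe")
  counts <- tabulate(severity, nbins = 4)  # severity coded 1, 2, 3, 4
  data.frame(level = 1:4, severity = labels, n = counts,
             pct = round(100 * counts / sum(counts), 2),
             stringsAsFactors = FALSE)
}

# Percentage shares of the counts reported in the severity table
reported <- c(66121, 6010987, 1272321, 197342)
round(100 * reported / sum(reported), 2)  # → 0.88 79.65 16.86 2.61
```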

3.4 Statistical Analysis

3.4.1 Correlation Analysis of Key Quantitative Features

The heatmap shows the pairwise correlations among the quantitative features temperature, wind chill, visibility, precipitation, and severity. Temperature and wind chill were nearly perfectly correlated (\(r = 0.99\)), as expected, since wind chill is computed from air temperature and wind speed. In contrast, severity was only weakly correlated with every other variable, suggesting that these weather measurements explain little of the variation in accident severity and that other factors not measured here play a larger role.
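A correlation matrix like the one behind the heatmap can be computed with base R's cor(). This sketch assumes the cleaned column names from the str() output and uses pairwise deletion, because wind_chill alone is missing for about 1.8 million rows; the original analysis may have handled missing values differently. Since Severity is an ordinal 1 to 4 score, a rank-based Spearman correlation is a reasonable alternative to Pearson's r.

```r
# Hypothetical helper: correlation matrix of severity and weather variables
severity_correlations <- function(df, method = "pearson") {
  vars <- c("Severity", "temperature", "wind_chill", "visibility", "precipitation")
  # pairwise deletion keeps rows that are missing only some of the variables
  round(cor(df[, vars], use = "pairwise.complete.obs", method = method), 2)
}
```

Calling `severity_correlations(df, method = "spearman")` gives the rank-based version for comparison.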

3.4.2 ANOVA on Accident Severity by Weather Condition

A one-way ANOVA was conducted to examine whether accident severity differs across five weather-condition groups (\(N = 1,\!814,\!828\) accidents). The results showed a statistically significant effect of weather on accident severity, \(F(4, 1,\!814,\!823) = 18,\!549\), \(p < .001\), indicating that mean accident severity varies across weather conditions.

##                  Df Sum Sq Mean Sq F value Pr(>F)    
## weather           4  18624    4656   18549 <2e-16 ***
## Residuals   1814823 455533       0                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
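With more than 1.8 million observations, even very small differences in mean severity reach significance, so an effect size is a useful complement to the p-value. A common choice for a one-way ANOVA is eta squared, the share of the total sum of squares attributable to the grouping factor, \(\eta^2 = SS_{weather} / (SS_{weather} + SS_{residual})\). The sketch below applies it to the rounded sums of squares printed above; the helper names are hypothetical.

```r
# eta^2 = SS_effect / (SS_effect + SS_residual) for a one-way ANOVA
eta_squared <- function(ss_effect, ss_residual) ss_effect / (ss_effect + ss_residual)

# Same quantity taken directly from a fitted aov() model
eta_squared_aov <- function(fit) {
  ss <- summary(fit)[[1]][["Sum Sq"]]  # first row: factor, last row: residuals
  ss[1] / sum(ss)
}

# Rounded sums of squares from the weather ANOVA table above
round(eta_squared(18624, 455533), 3)  # → 0.039
```

By this measure, weather group accounts for roughly 4% of the variance in severity within the analyzed subset: statistically significant, but modest.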

3.4.3 T-Tests on Severity and Frequency for Holidays

3.4.3.1 Severity on Specific Holidays T-Test

A Welch two-sample t-test was conducted to compare accident severity on specific holidays versus other days. The results showed a statistically significant difference in severity scores, \(t(93,\!469) = 2.50\), \(p = .0125\). The average severity on non-holidays (\(M = 2.212\)) was slightly higher than on holidays (\(M = 2.208\)), with a 95% confidence interval for the difference in means ranging from 0.0009 to 0.0073.

3.4.3.2 Frequency on Specific Holidays T-Test

A Welch two-sample t-test was also conducted to compare the average number of accidents per day on holidays versus non-holidays. The results were statistically significant, \(t(43.04) = 3.27\), \(p = .0021\). The mean number of accidents per day was higher on non-holidays (\(M = 2,\!947\)) than on holidays (\(M = 2,\!173\)), with a 95% confidence interval for the difference in means ranging from 297 to 1,250.

## [1] "T-test on Severity (Specific Holidays):"
## 
##  Welch Two Sample t-test
## 
## data:  Severity by holiday_specific
## t = 2.4975, df = 93469, p-value = 0.01251
## alternative hypothesis: true difference in means between group FALSE and group TRUE is not equal to 0
## 95 percent confidence interval:
##  0.0008838382 0.0073295171
## sample estimates:
## mean in group FALSE  mean in group TRUE 
##            2.212178            2.208071
## [1] "T-test on Frequency (Specific Holidays):"
## 
##  Welch Two Sample t-test
## 
## data:  n_acc by holiday_specific
## t = 3.2727, df = 43.041, p-value = 0.002105
## alternative hypothesis: true difference in means between group FALSE and group TRUE is not equal to 0
## 95 percent confidence interval:
##   296.8096 1249.9020
## sample estimates:
## mean in group FALSE  mean in group TRUE 
##            2946.832            2173.476
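The two holiday tests can be sketched in base R as follows. This is not the authors' code: it assumes a data frame (called `acc` here) with the Severity, date_, and holiday_specific columns, and the helper name is hypothetical. R's t.test() performs the Welch test by default (`var.equal = FALSE`), and the frequency test compares calendar days rather than individual accidents, which is why its degrees of freedom (about 43) are so much smaller than those of the severity test.

```r
# Hypothetical helper reproducing the structure of the two holiday t-tests
holiday_tests <- function(acc) {
  # 1) Severity: one observation per accident, grouped by holiday flag.
  #    t.test() uses Welch's unequal-variance test unless var.equal = TRUE.
  sev <- t.test(Severity ~ holiday_specific, data = acc)

  # 2) Frequency: count accidents per calendar day, then compare days.
  #    Days with zero accidents would be missing here (assumed not to occur).
  daily <- aggregate(Severity ~ date_ + holiday_specific, data = acc, FUN = length)
  names(daily)[names(daily) == "Severity"] <- "n_acc"
  freq <- t.test(n_acc ~ holiday_specific, data = daily)

  list(severity = sev, frequency = freq, daily = daily)
}
```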

3.4.3.3 Visualization: Severity on Holidays

Although the difference is small, the chart shows a slightly higher average severity for accidents on non-holidays than on holidays: 2.212 versus 2.208, a gap of about 0.004 on the 1 to 4 scale. The corresponding Welch t-test (\(t(93,\!469) = 2.50\), \(p = .0125\)) confirms that this difference is statistically significant, but it is not practically meaningful. This suggests that while there are fewer accidents on holidays, those accidents are neither meaningfully more nor less severe.
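One way to quantify "not practically meaningful" is a standardized mean difference (Cohen's d). The sketch below is a rough approximation: it uses the group means from the t-test output and, as an assumption, substitutes the standard deviation of all 7,546,771 severity scores (reconstructed from the counts in the severity table) for the pooled within-group standard deviation, which the report does not give.

```r
# Rough Cohen's d for holiday vs. non-holiday severity (approximation)
counts     <- c(66121, 6010987, 1272321, 197342)  # accidents at severity 1-4
sev_levels <- 1:4
mu     <- weighted.mean(sev_levels, counts)                 # overall mean severity
sd_all <- sqrt(weighted.mean((sev_levels - mu)^2, counts))  # overall SD (population form)
d      <- (2.212178 - 2.208071) / sd_all  # non-holiday minus holiday mean
round(c(mean = mu, sd = sd_all, d = d), 3)
```

The resulting d of roughly 0.008 is far below the conventional 0.2 threshold for a "small" effect, consistent with the interpretation above.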

3.4.3.4 Visualization: Frequency of Accidents on Holidays

The bar chart shows that the average number of accidents per day is significantly lower on specific holidays than on non-holiday dates: about 2,173 accidents per day on holidays versus 2,947 on non-holidays, roughly 26% fewer. This visual supports the Welch two-sample t-test (\(t(43.04) = 3.27\), \(p = .0021\)), which confirms that the difference is statistically significant. The lower volume on holidays may reflect reduced traffic due to time off from work and school.

4 Discussion

The results of this analysis confirm several intuitive but important patterns in U.S. traffic accidents. Time-based trends show that accidents peak during weekday rush hours and are far less frequent in the early morning hours and on weekends. December's elevated accident volume suggests that seasonal effects such as holiday travel and poor weather conditions may play a significant role.
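A weekday-by-hour count table is one simple way to surface rush-hour peaks like those described above. The sketch assumes Start_Time is POSIXct and uses a hypothetical helper name; the `%u` format code returns the ISO weekday (1 = Monday, 7 = Sunday), which avoids locale-dependent day names.

```r
# Hypothetical helper: accident counts cross-tabulated by weekday and hour
hour_by_weekday <- function(start_time) {
  wday <- factor(as.integer(format(start_time, "%u")), levels = 1:7)   # 1 = Monday
  hour <- factor(as.integer(format(start_time, "%H")), levels = 0:23)  # keep empty hours
  table(weekday = wday, hour = hour)
}
```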

Environmental factors such as temperature and precipitation demonstrated limited but interesting relationships with accident severity and frequency. While accidents were least common during extreme temperatures, those that occurred under these conditions tended to be more severe. Weather conditions like “Overcast” and “Scattered Clouds” were associated with higher average severity, possibly reflecting poor visibility or driver overconfidence in seemingly stable weather.
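Statements such as "least common but more severe at extreme temperatures" rest on grouping accidents into temperature ranges and comparing the count with the mean severity in each range. A minimal sketch of that comparison follows; the helper name and the bin edges are illustrative assumptions, not the bins used for the report's figures.

```r
# Hypothetical helper: accident count and mean severity per temperature bin
temperature_profile <- function(temperature, severity,
                                breaks = c(-Inf, 20, 40, 60, 80, 100, Inf)) {
  bin <- cut(temperature, breaks = breaks)  # right-closed intervals, e.g. (20,40]
  data.frame(bin           = levels(bin),
             n             = as.vector(table(bin)),                  # NA temperatures dropped
             mean_severity = as.vector(tapply(severity, bin, mean)), # NA for empty bins
             stringsAsFactors = FALSE)
}
```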

Population-adjusted accident rates revealed geographic hotspots. South Carolina and Louisiana had the highest accident rates per 100,000 residents, yet they were not among the worst states in terms of average severity. This distinction could point to differences in reporting practices, infrastructure quality, or emergency response times across states.
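The population adjustment behind these rankings divides each state's accident count by its population and scales the result to 100,000 residents: rate = accidents / population × 100,000. The report does not state which population figures were used, so the state names and numbers in this sketch are hypothetical.

```r
# Accidents per 100,000 residents
rate_per_100k <- function(accidents, population) accidents / population * 1e5

# Hypothetical example: three made-up states
states <- data.frame(state      = c("State A", "State B", "State C"),
                     accidents  = c(50000, 120000, 9000),
                     population = c(5e6, 20e6, 600000),
                     stringsAsFactors = FALSE)
states$rate <- rate_per_100k(states$accidents, states$population)
ranked <- states[order(-states$rate), ]  # highest rate first
ranked
```

In this made-up example, State B has the most accidents but the lowest rate, which is the kind of reordering that population adjustment is meant to reveal.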

Finally, our holiday t-tests showed that although fewer accidents occur on holidays, those that do are not more severe: their mean severity is marginally lower (2.208 versus 2.212), a difference that is statistically significant but practically negligible. This suggests that reduced traffic volume on holidays may offset any added risk from holiday distractions or celebrations.

5 Conclusion

This exploratory data visualization and statistical analysis of U.S. traffic accidents from 2016 to 2023 provides a multi-dimensional view of when, where, and under what conditions accidents are most likely to occur.

The findings emphasize:

- The importance of targeted safety efforts during weekday commute hours
- Seasonal and weather-related influences on accident frequency and severity
- Geographic disparities in accident rates that may warrant region-specific interventions
- That holidays do not appear inherently more dangerous, but still merit focused traffic safety messaging because accidents on those days, while fewer, remain numerous

This project underscores the power of combining time-series analysis, geospatial mapping, and statistical testing to support data-driven transportation planning and public safety strategy. Future research could expand by incorporating traffic volume data, urban/rural context, or vehicle-specific information to deepen these insights.